Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays
نویسنده
چکیده
منابع مشابه
Practical aspects of Compressed Suffix Arrays and FM-Index in Searching DNA Sequences
Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compress...
متن کاملIndexing huge genome sequences for solving various problems.
Because of the increase in the size of genome sequence databases, the importance of indexing the sequences for fast queries grows. Suffix trees and suffix arrays are used for simple queries. However these are not suitable for complicated queries from huge amount of sequences because the indices are stored in disk which has slow access speed. We propose storing the indices in memory in a compres...
متن کاملSuffix arrays: what are they good for?
Recently the theoretical community has displayed a flurry of interest in suffix arrays, and compressed suffix arrays. New, asymptotically optimal algorithms for construction, search, and compression of suffix arrays have been proposed. In this talk we will present our investigations into the practicalities of these latest developments. In particular, we investigate whether suffix arrays can ind...
متن کاملBroadword Implementation of Rank/Select Queries
Research on succinct data structures (data structures occupying space close to the informationtheoretical lower bound, but achieving speed similar to their standard counterparts) has steadily increased in the last few years. However, many theoretical constructions providing asymptotically optimal bounds are unusable in practise because of the very large constants involved. The study of practica...
متن کاملDirectly Addressable Variable-Length Codes
We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without need of any sampling method. The technique is practical and has many applications to the representation of ordered sets, sparse bitmaps, partial sums, and compressed data structures for suffix trees, arrays, and inverted indexes, to...
متن کامل